47 research outputs found

    Transforming the study of organisms: Phenomic data models and knowledge bases

    The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem

    Methods for broad-scale plant phenology assessments using citizen scientists’ photographs

    © 2020 Barve et al. Applications in Plant Sciences is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America. Premise: Citizen science platforms for sharing photographed digital vouchers, such as iNaturalist, are a promising source of phenology data, but methods and best practices for use have not been developed. Here we introduce methods using Yucca flowering phenology as a case study, because drivers of Yucca phenology are not well understood despite the need to synchronize flowering with obligate pollinators. There is also evidence of recent anomalous winter flowering events, but with unknown spatiotemporal extents. Methods: We collaboratively developed a rigorous, consensus-based approach for annotating and sharing whole plant and flower presence data from iNaturalist and applied it to Yucca records. We compared spatiotemporal flowering coverage from our annotations with other broad-scale monitoring networks (e.g., the National Phenology Network) in order to determine the unique value of photograph-based citizen science resources. Results: Annotations from iNaturalist were uniquely able to delineate extents of unusual flowering events in Yucca. These events, which occurred in two different regions of the Desert Southwest, did not appear to disrupt the typical-period flowering. Discussion: Our work demonstrates that best practice approaches to scoring iNaturalist records provide fine-scale delimitation of phenological events. This approach can be applied to other plant groups to better understand how phenology responds to changing climate

    Machine learning using digitized herbarium specimens to advance phenological research

    Machine learning (ML) has great potential to drive scientific discovery by harvesting data from images of herbarium specimens—preserved plant material curated in natural history collections—but ML techniques have only recently been applied to this rich resource. ML has particularly strong prospects for the study of plant phenological events such as growth and reproduction. As a major indicator of climate change, driver of ecological processes, and critical determinant of plant fitness, plant phenology is an important frontier for the application of ML techniques for science and society. In the present article, we describe a generalized, modular ML workflow for extracting phenological data from images of herbarium specimens, and we discuss the advantages, limitations, and potential future improvements of this workflow. Strategic research and investment in specimen-based ML methods, along with the aggregation of herbarium specimen data, may give rise to a better understanding of life on Earth

    Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies

    The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers

    Phenotypic Switching of Nonpeptidergic Cutaneous Sensory Neurons following Peripheral Nerve Injury

    In adult mammals, the phenotype of half of all pain-sensing (nociceptive) sensory neurons is tonically modulated by growth factors in the glial cell line-derived neurotrophic factor (GDNF) family that includes GDNF, artemin (ARTN) and neurturin (NRTN). Each family member binds a distinct GFRα family co-receptor, such that GDNF, NRTN and ARTN bind GFRα1, -α2, and -α3, respectively. Previous studies revealed transcriptional regulation of all three receptors following axotomy, possibly in response to changes in growth factor availability. Here, we examined changes in the expression of GFRα1-3 in response to injury in vivo and in vitro. We found that after dissociation of adult sensory ganglia, up to 27% of neurons die within 4 days (d) in culture and this can be prevented by nerve growth factor (NGF), GDNF and ARTN, but not NRTN. Moreover, up-regulation of ATF3 (a marker of neuronal injury) in vitro could be prevented by NGF and ARTN, but not by GDNF or NRTN. The lack of NRTN efficacy was correlated with rapid and near-complete loss of GFRα2 immunoreactivity. By retrogradely-labeling cutaneous afferents in vivo prior to nerve cut, we demonstrated that GFRα2-positive neurons switch phenotype following injury and begin to express GFRα3 as well as the capsaicin receptor, transient receptor potential vanilloid 1 (TRPV1), an important transducer of noxious stimuli. This switch was correlated with down-regulation of Runt-related transcription factor 1 (Runx1), a transcription factor that controls expression of GFRα2 and TRPV1 during development. These studies show that NRTN-responsive neurons are unique with respect to their plasticity and response to injury, and suggest that Runx1 plays an ongoing modulatory role in the adult

    The trouble with triplets in biodiversity informatics: a data-driven case against current identifier practices.

    The biodiversity informatics community has discussed aspirations and approaches for assigning globally unique identifiers (GUIDs) to biocollections for nearly a decade. During that time, and despite misgivings, the de facto standard identifier has become the "Darwin Core Triplet", which is a concatenation of values for institution code, collection code, and catalog number associated with biocollections material. Our aim is not to rehash the challenging discussions regarding which GUID system in theory best supports the biodiversity informatics use case of discovering and linking digital data across the Internet, but how well we can link those data together at this moment, utilizing the current identifier schemes that have already been deployed. We gathered Darwin Core Triplets from a subset of VertNet records, along with vertebrate records from GenBank and the Barcode of Life Data System, in order to determine how Darwin Core Triplets are deployed "in the wild". We asked if those triplets follow the recommended structure and whether they provide an easy and unambiguous means to track from specimen records to genetic sequence records. We show that Darwin Core Triplets are often riddled with semantic and syntactic errors when deployed and curated in practice, despite specifications about how to construct them. Our results strongly suggest that Darwin Core Triplets that have not been carefully curated are not currently serving a useful role for relinking data. We briefly consider needed next steps to overcome current limitations